
Add gpt-5.4-codex to resolve_model_config.py #2376

Draft
juanmichelini wants to merge 4 commits into main from gpt-5.4-codex

Conversation

@juanmichelini
Collaborator

@juanmichelini juanmichelini commented Mar 10, 2026

Summary

Adds the gpt-5.4-codex model to resolve_model_config.py with corresponding tests and heuristics.

Changes

  • Added gpt-5.4-codex to MODELS dictionary in resolve_model_config.py
  • Added gpt-5.4-codex to GPT-5 codex variants in model_prompt_spec.py
  • Added test_gpt_5_4_codex_config() test function
  • Added gpt-5.4-codex to reasoning effort test cases in test_model_features.py

Configuration

  • Model ID: gpt-5.4-codex
  • Display name: GPT-5.4 Codex
  • Provider: OpenAI (via litellm_proxy)
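
For illustration, the kind of entry this PR adds might look like the minimal sketch below. The actual schema of resolve_model_config.py is not shown in this PR, so the dictionary field names and the lookup helper are assumptions, not the real implementation.

```python
# Minimal sketch of the kind of entry this PR adds. The real schema of
# resolve_model_config.py is not visible here, so every field name below
# (display_name, model) is an assumption for illustration only.
MODELS = {
    "gpt-5.4-codex": {
        "display_name": "GPT-5.4 Codex",
        "model": "litellm_proxy/gpt-5.4-codex",  # served via the LiteLLM proxy
    },
}


def resolve_model_config(model_id: str) -> dict:
    """Return the configuration dict for a known model ID."""
    try:
        return MODELS[model_id]
    except KeyError:
        raise ValueError(f"Unknown model: {model_id}") from None


config = resolve_model_config("gpt-5.4-codex")
print(config["display_name"])  # GPT-5.4 Codex
```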

Integration Test Results

Tests will run in CI.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image                                 | Docs / Tags
java    | amd64, arm64  | eclipse-temurin:17-jdk                     | Link
python  | amd64, arm64  | nikolaik/python-nodejs:python3.13-nodejs22 | Link
golang  | amd64, arm64  | golang:1.21-bookworm                       | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:bca110a-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-bca110a-python \
  ghcr.io/openhands/agent-server:bca110a-python

All tags pushed for this build

ghcr.io/openhands/agent-server:bca110a-golang-amd64
ghcr.io/openhands/agent-server:bca110a-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:bca110a-golang-arm64
ghcr.io/openhands/agent-server:bca110a-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:bca110a-java-amd64
ghcr.io/openhands/agent-server:bca110a-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:bca110a-java-arm64
ghcr.io/openhands/agent-server:bca110a-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:bca110a-python-amd64
ghcr.io/openhands/agent-server:bca110a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:bca110a-python-arm64
ghcr.io/openhands/agent-server:bca110a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:bca110a-golang
ghcr.io/openhands/agent-server:bca110a-java
ghcr.io/openhands/agent-server:bca110a-python

About Multi-Architecture Support

  • Each variant tag (e.g., bca110a-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., bca110a-python-amd64) are also available if needed

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Mar 10, 2026

API breakage checks (Griffe)

Result: Passed

Action log

@github-actions
Contributor

github-actions bot commented Mar 10, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Failed

Log excerpt (first 1000 characters)
{"asctime": "2026-03-11 16:29:15,509", "levelname": "WARNING", "name": "openhands.agent_server.config", "filename": "config.py", "lineno": 173, "message": "\u26a0\ufe0f OH_SECRET_KEY was not defined. Secrets will not be persisted between restarts."}
::error title=openhands-agent-server REST API::Breaking REST API change detected without MINOR version bump (1.13.0 -> 1.13.0).

Breaking REST API changes detected compared to baseline release:
- the 'file' request property type/format changed from 'string'/'' to 'string'/'binary'
/home/runner/work/software-agent-sdk/software-agent-sdk/.venv/lib/python3.13/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:66: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()

Action log

Collaborator

@all-hands-bot all-hands-bot left a comment

🟢 Good taste - Clean addition following existing pattern.

This change is technically sound and follows the established pattern perfectly. However, since this modifies evaluation harness code (.github/run-eval/resolve_model_config.py), it should be reviewed by a human maintainer to confirm no unintended impact on benchmark results before merging.

@juanmichelini juanmichelini marked this pull request as ready for review March 10, 2026 15:00

@juanmichelini juanmichelini marked this pull request as draft March 10, 2026 15:02
@juanmichelini
Collaborator Author

@OpenHands update PR to also add a test and add it to heuristics like this PR does #2374

@juanmichelini
Collaborator Author

@OpenHands update PR #2376 to also add a test and add it to heuristics like this PR does #2374

@openhands-ai

openhands-ai bot commented Mar 11, 2026

I'm on it! juanmichelini can track my progress at all-hands.dev

- Add gpt-5.4-codex to model variant patterns in model_prompt_spec.py
- Add test_gpt_5_4_codex_config() test for configuration validation
- Add gpt-5.4-codex to reasoning effort test cases

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Mar 11, 2026

Summary

I've successfully updated PR #2376 to add tests and heuristics for gpt-5.4-codex, following the pattern from PR #2374.

Changes Made (commit 79de66ee):

File | Change
openhands-sdk/openhands/sdk/llm/utils/model_prompt_spec.py | Added gpt-5.4-codex to the GPT-5 codex variant patterns
tests/github_workflows/test_resolve_model_config.py | Added test_gpt_5_4_codex_config() test function
tests/sdk/llm/test_model_features.py | Added gpt-5.4-codex to reasoning effort test cases

Checklist:

Note: gpt-5.4-codex doesn't need to be explicitly added to REASONING_EFFORT_MODELS in model_features.py because "gpt-5" is already in that list and uses substring matching (which covers all gpt-5.x variants including codex). The test case I added confirms this works correctly.
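
The substring behavior described in the note above can be sketched as follows. The real REASONING_EFFORT_MODELS list and matching helper in model_features.py may differ in detail; the list contents and function name here are illustrative assumptions.

```python
# Illustrative sketch of the substring matching described above; the real
# list and helper in model_features.py may differ in detail.
REASONING_EFFORT_MODELS = ["gpt-5"]  # illustrative subset of the real list


def supports_reasoning_effort(model_name: str) -> bool:
    """A model supports reasoning effort if any known pattern is a substring."""
    name = model_name.lower()
    return any(pattern in name for pattern in REASONING_EFFORT_MODELS)


# "gpt-5" is a substring of "gpt-5.4-codex", so no explicit entry is needed.
print(supports_reasoning_effort("litellm_proxy/gpt-5.4-codex"))  # True
```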

PR link: #2376

@github-actions
Contributor

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-sdk/openhands/sdk/llm/utils/model_prompt_spec.py | 38 | 2 | 94% | 61, 81
TOTAL | 19913 | 5791 | 70% |


@github-actions
Contributor

🧪 Integration Tests Results

Overall Success Rate: 0.0%
Total Cost: $0.00
Models Tested: 1
Timestamp: 2026-03-11 21:18:07 UTC

📊 Summary

Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_gpt_5.4_codex | 0.0% | 0/16 | 2 | 18 | $0.00 | 0

📋 Detailed Results

litellm_proxy_gpt_5.4_codex

  • Success Rate: 0.0% (0/16)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_gpt_5.4_codex_79de66e_gpt_5_4_codex_run_N18_20260311_211716
  • Skipped Tests: 2

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.
  • c01_thinking_block_condenser: Model litellm_proxy/gpt-5.4-codex does not support extended thinking or reasoning effort

Failed Tests:

All 16 failed tests aborted with the same routing error from the LiteLLM proxy (HTTP 400, Cost: $0.00 each):

litellm.BadRequestError: You passed in model=gpt-5.4-codex. There are no healthy deployments for this model. No fallback model group found for original model_group=gpt-5.4-codex. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]

  • t01_fix_simple_typo (conversation id 2673d622-affa-43bb-9ead-c2e7c11c3a68)
  • t07_interactive_commands (conversation id 8e420f7d-cbef-417d-a18f-5d2d8ac52f76)
  • c02_hard_context_reset (conversation id a47c3dc0-b50d-423f-87e5-e15829ba0361)
  • t06_github_pr_browsing (conversation id f67be3b3-0572-488b-ac85-7d52d4c6b679)
  • c05_size_condenser (conversation id 9e99335f-bb43-4c26-b8f6-246a135fd7d1)
  • t04_git_staging (conversation id 87f40d1c-9db1-4c18-8c7b-13fe612875dc)
  • t05_simple_browsing (conversation id 621671f1-320b-4040-9469-580fd18bb0ea)
  • t03_jupyter_write_file (conversation id b0371d13-70d8-4096-9736-076631ae2940)
  • c03_delayed_condensation (conversation id ab094867-4b4f-4cdd-82e3-dd26026df80c)
  • b04_each_tool_call_has_a_concise_explanation (conversation id a0ed80d1-4325-4308-8c4f-e6982a60218d)
  • b02_no_oververification (conversation id e76bedbf-d8e2-491c-845a-74f29332120c)
  • t02_add_bash_hello (conversation id c8205f48-323c-4658-aaf9-3e05d0fd2636)
  • c04_token_condenser (conversation id 1570eb24-b4a3-4fe5-9672-317f98ce4c4c)
  • b01_no_premature_implementation (conversation id bd118ecb-cc54-43c7-be38-4636af510dc5)
  • b03_no_useless_backward_compatibility (conversation id 4e2957af-2895-4093-9e9d-c1ee7801abbf)
  • b05_do_not_create_redundant_files (conversation id 183805e7-820a-466f-a1f6-024bfaa3af68)
